Econometrics - Lecture 5
Multiple Linear Regression
1 Introduction
- In this lecture, we extend the simple linear regression framework to include multiple predictors.
- This approach allows us to understand the impact of each independent variable on the dependent variable while controlling for other factors.
- By adding more predictors, we can better capture the complexity of real-world relationships and reduce omitted variable bias.
2 Key Concepts
- Adding More Predictors
- Partial Effects (Controlling for Other Variables)
- Interpretation Nuances (Coefficients in Multiple Regression)
- Model Selection & Fit Criteria
3 Theoretical Discussion
A multiple linear regression model with several predictors can be written as:
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon \]
- \(y\): The dependent variable
- \(x_1, x_2, \dots\): Independent variables (predictors)
- \(\beta_0\): The intercept, representing the expected value of \(y\) when all predictors are zero
- \(\beta_1, \beta_2, \dots\): Slope coefficients, each indicating how \(y\) changes with a one-unit increase in the corresponding predictor, holding other variables constant
- \(\varepsilon\): The error term, assumed to have a mean of zero if the model is correctly specified
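To make the notation concrete, here is a minimal sketch in R that simulates data from a known two-predictor model and recovers the coefficients with `lm()` (the data and the coefficient values 1, 2, and -0.5 are invented for this illustration):

```r
# Simulate data from a known model: y = 1 + 2*x1 - 0.5*x2 + noise
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.5)

# Fit the multiple regression and inspect the estimates
fit <- lm(y ~ x1 + x2)
coef(fit)  # estimates should land close to (1, 2, -0.5)
```

With 200 observations and little noise, the estimated coefficients sit close to the true values used in the simulation.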
3.1 Partial Effects and Controlling for Other Variables
- Partial Effect: In a multiple regression, \(\beta_1\) represents the expected change in \(y\) for a one-unit increase in \(x_1\), holding \(x_2, x_3, \dots\) constant.
- Mathematically, each coefficient is the partial derivative of \(y\) with respect to \(x_j\), i.e. \[\beta_j = \frac{\partial y}{\partial x_j}\]
- Importance of Controlling: By including additional predictors, we can isolate the effect of each variable and reduce omitted variable bias.
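Omitted variable bias can be seen directly in a small simulation (all numbers here are invented for the sketch): when a relevant predictor \(x_2\) that is correlated with \(x_1\) is left out, the coefficient on \(x_1\) absorbs part of \(x_2\)'s effect.

```r
# Omitted variable bias with simulated data.
# True model: y = 1*x1 + 2*x2 + noise, with x1 and x2 correlated.
set.seed(1)
n  <- 500
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n)   # x1 is correlated with x2
y  <- 1 * x1 + 2 * x2 + rnorm(n)

coef(lm(y ~ x1))        # short regression: x1 coefficient biased upward
coef(lm(y ~ x1 + x2))   # controlling for x2: coefficient near the true 1
```

The short regression picks up roughly \(\beta_2 \cdot \operatorname{cov}(x_1, x_2)/\operatorname{var}(x_1)\) of extra slope; including \(x_2\) removes that bias.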
3.2 Interpretation Nuances
- Coefficient Magnitude: Each \(\beta_j\) shows how \(y\) changes with \(x_j\), keeping other predictors fixed.
- Significance and p-values: Determine if \(\beta_j\) is significantly different from zero.
- Multicollinearity: Highly correlated predictors can inflate standard errors, complicating interpretation.
- Model Fit: Metrics like Adjusted R-squared become more relevant when comparing models with different numbers of predictors.
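A common multicollinearity diagnostic is the variance inflation factor, \(\text{VIF}_j = 1/(1 - R^2_j)\), where \(R^2_j\) comes from regressing predictor \(j\) on the other predictors. It can be computed by hand in base R; this sketch uses mtcars, and the choice of disp as the predictor to check is arbitrary:

```r
# Variance inflation factor for one predictor, computed from its definition
data(mtcars)

# Regress disp on the other predictors of a mpg ~ disp + cyl + hp model
r2_disp  <- summary(lm(disp ~ cyl + hp, data = mtcars))$r.squared
vif_disp <- 1 / (1 - r2_disp)
vif_disp  # rules of thumb flag values above roughly 5-10
```

A large VIF means the predictor is nearly a linear combination of the others, which inflates the standard error of its coefficient.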
Model selection involves determining which predictors to include in a multiple regression model, balancing model performance with interpretability. Below is an overview of common techniques and considerations for selecting and evaluating models.
3.3 Forward, Backward, Stepwise Selection
- Forward. Begin with no predictors, adding them one at a time based on a selection criterion (for instance, a p-value threshold or an information criterion).
- Backward. Start with all candidate predictors, removing them one at a time, typically removing the least significant predictor at each step.
- Stepwise. Combine forward and backward methods by iteratively adding or removing predictors, attempting to find an optimal subset.
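Base R's `step()` implements these searches using AIC; a minimal sketch on mtcars (the explicit scope formula simply lists all candidate predictors):

```r
# Forward and backward stepwise search with base R's step() (AIC-based)
data(mtcars)
null_model <- lm(mpg ~ 1, data = mtcars)  # intercept only
full_model <- lm(mpg ~ ., data = mtcars)  # all candidate predictors

# Forward: start empty, add the term that most improves AIC at each step
fwd <- step(null_model,
            scope = ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
            direction = "forward", trace = 0)

# Backward: start full, drop the term whose removal most improves AIC
bwd <- step(full_model, direction = "backward", trace = 0)

formula(fwd)
formula(bwd)
```

Note that forward and backward searches need not agree on the final subset, which is one reason these procedures should be checked against theory rather than applied mechanically.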
3.4 Pros and Cons
- Pros: these procedures are automated and help narrow down large sets of candidate predictors.
- Cons: they can be sensitive to the order in which variables enter or leave the model, may overlook important predictors or retain irrelevant ones, and can result in overfitting.
3.5 Model Selection & Fit Criteria
AIC and BIC (Information Criteria)
Information criteria provide a way to compare model fit while penalizing excessive complexity. They are based on the log-likelihood function of the model.
The Log-Likelihood Function for a multiple linear regression model assuming normally distributed errors is:
\[ \ln(L) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \]
where
- \(n\) is the number of observations,
- \(\sigma^2\) is the variance of the error term,
- \(\hat{y}_i\) is the predicted value of \(y_i\).
The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are defined as:
\[ \text{AIC} = 2k - 2\ln(\hat{L}) \]
\[ \text{BIC} = k\ln(n) - 2\ln(\hat{L}) \]
where
- \(k\) is the number of parameters in the model,
- \(\ln(\hat{L})\) is the maximized value of the log likelihood function for the model,
- \(n\) is the sample size.
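In R, `logLik()`, `AIC()`, and `BIC()` implement these quantities directly; note that for `lm` fits R counts the error variance \(\sigma^2\) as one of the \(k\) parameters. A quick check of the definitions (the model formula here is just an example):

```r
# AIC and BIC computed from their definitions, checked against R's built-ins
data(mtcars)
fit <- lm(mpg ~ wt + cyl, data = mtcars)

ll <- as.numeric(logLik(fit))   # maximized log-likelihood
k  <- attr(logLik(fit), "df")   # parameter count, including sigma^2
n  <- nobs(fit)

aic_manual <- 2 * k - 2 * ll
bic_manual <- k * log(n) - 2 * ll

all.equal(aic_manual, AIC(fit))  # TRUE
all.equal(bic_manual, BIC(fit))  # TRUE
```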
Use Cases:
AIC is generally preferred when the primary goal is prediction. It balances model fit with complexity but imposes a lighter penalty on additional parameters, so it tends to retain more variables; this makes it suitable when omitted variable bias is a greater concern than overfitting.
BIC is often favored when the goal is identifying the underlying structural model. It imposes a heavier penalty on additional parameters, which helps avoid overfitting and favors simpler models.
Comparing Models: When comparing models fitted to the same outcome variable, a difference of roughly 2–7 points in AIC or BIC is usually read as positive to moderate evidence in favor of the model with the lower score; differences above about 10 indicate strong evidence.
Practical Tips
Do not rely solely on p-values. Incorporate theoretical considerations, domain expertise, and additional metrics such as adjusted R-squared, AIC, or BIC.
Check diagnostics. Even models that look good on paper can fail if they violate OLS assumptions such as linearity or homoscedasticity.
Aim for parsimony. Strive for a balance between simplicity and explanatory power to avoid overfitting and to keep the model interpretable.
4 Case Study
We will perform a multiple regression analysis using the mtcars dataset to predict mpg based on all other variables. The steps include loading the data, estimating the model, selecting the best model using AIC and BIC, and visualizing the residuals.
Step 1: Load Data
Code
# Load necessary libraries
pacman::p_load(olsrr, ggplot2)
# Load the mtcars dataset
data(mtcars)
Step 2: Estimate the Full Model
Estimate a multiple linear regression model with mpg as the dependent variable and all other variables as independent variables.
Code
# Fit the full multiple linear regression model
full_model <- lm(mpg ~ ., data = mtcars)
# View the summary of the full model
summary(full_model)
Call:
lm(formula = mpg ~ ., data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4506 -1.6044 -0.1196 1.2193 4.6271
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.30337 18.71788 0.657 0.5181
cyl -0.11144 1.04502 -0.107 0.9161
disp 0.01334 0.01786 0.747 0.4635
hp -0.02148 0.02177 -0.987 0.3350
drat 0.78711 1.63537 0.481 0.6353
wt -3.71530 1.89441 -1.961 0.0633 .
qsec 0.82104 0.73084 1.123 0.2739
vs 0.31776 2.10451 0.151 0.8814
am 2.52023 2.05665 1.225 0.2340
gear 0.65541 1.49326 0.439 0.6652
carb -0.19942 0.82875 -0.241 0.8122
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.65 on 21 degrees of freedom
Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
Step 3: Model Selection Using AIC and BIC
Use the olsrr package to perform model selection based on AIC and BIC criteria.
AIC Model
Code
# Stepwise selection based on AIC
model_aic <- ols_step_both_aic(full_model, details = FALSE)
model_aic
Stepwise Summary
-------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------
0 Base Model 208.756 211.687 115.061 0.00000 0.00000
1 wt (+) 166.029 170.427 74.373 0.75283 0.74459
2 cyl (+) 156.010 161.873 66.190 0.83023 0.81852
3 hp (+) 155.477 162.805 66.696 0.84315 0.82634
-------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.918 RMSE 2.349
R-Squared 0.843 MSE 5.519
Adj. R-Squared 0.826 Coef. Var 12.501
Pred R-Squared 0.796 AIC 155.477
MAE 1.845 SBC 162.805
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 949.427 3 316.476 50.171 0.0000
Residual 176.621 28 6.308
Total 1126.047 31
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 38.752 1.787 21.687 0.000 35.092 42.412
wt -3.167 0.741 -0.514 -4.276 0.000 -4.684 -1.650
cyl -0.942 0.551 -0.279 -1.709 0.098 -2.070 0.187
hp -0.018 0.012 -0.205 -1.519 0.140 -0.042 0.006
----------------------------------------------------------------------------------------
BIC Model
Code
# Stepwise selection based on BIC (SBC)
model_bic <- ols_step_both_sbc(full_model, details = FALSE)
model_bic
Stepwise Summary
-------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------
0 Base Model 208.756 211.687 115.061 0.00000 0.00000
1 wt (+) 166.029 170.427 74.373 0.75283 0.74459
2 cyl (+) 156.010 161.873 66.190 0.83023 0.81852
-------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.911 RMSE 2.444
R-Squared 0.830 MSE 5.974
Adj. R-Squared 0.819 Coef. Var 12.780
Pred R-Squared 0.790 AIC 156.010
MAE 1.921 SBC 161.873
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 934.875 2 467.438 70.908 0.0000
Residual 191.172 29 6.592
Total 1126.047 31
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 39.686 1.715 23.141 0.000 36.179 43.194
wt -3.191 0.757 -0.518 -4.216 0.000 -4.739 -1.643
cyl -1.508 0.415 -0.447 -3.636 0.001 -2.356 -0.660
----------------------------------------------------------------------------------------
Explanation:
- AIC (Akaike Information Criterion): Balances model fit with complexity. Lower AIC indicates a better model.
- BIC (Bayesian Information Criterion): Similar to AIC but imposes a heavier penalty for additional parameters, favoring simpler models. Lower BIC indicates a better model.
By comparing AIC and BIC, we can select a model that adequately fits the data without unnecessary complexity.
Step 4: Residuals vs. Fitted Plot
Plot the residuals versus fitted values to assess the assumptions of linearity and homoscedasticity.
Code
# Plot Residuals vs Fitted for the selected model (AIC)
ggplot(model_aic$model, aes(x = fitted(model_aic$model), y = resid(model_aic$model))) +
geom_point(color = "blue") +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Residuals vs Fitted (AIC Selected Model)",
x = "Fitted Values",
y = "Residuals") +
theme_minimal()
Step 5: Plot Histograms of Residuals
Visualize the distribution of residuals to assess normality.
Code
# Histogram of Residuals for the selected model (AIC)
ggplot(model_aic$model, aes(x = resid(model_aic$model))) +
geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals (AIC Selected Model)",
x = "Residuals",
y = "Frequency") +
theme_minimal()
5 Conclusion
- Takeaway 1: Multiple linear regression captures more complex relationships by including additional predictors, reducing omitted variable bias.
- Takeaway 2: Coefficients in multiple regression represent partial effects, showing how the dependent variable changes with one predictor while controlling for others.
- Takeaway 3: Model selection and fit criteria (like AIC, BIC, and stepwise methods) can guide which predictors to include, but practical judgment and theoretical considerations remain essential.